Incremental Parameter Estimation of Stochastic Context-Free Grammars without Retention of the Sentence Corpus

Authors

  • Brent Heeringa
  • Tim Oates

Abstract

Stochastic context-free grammars (SCFGs) are often used to represent the syntax of natural languages. Most algorithms for learning these grammars from data require a corpus of sentences to be stored and repeatedly processed. We are interested in how embedded agents might learn the syntax of natural languages from exposure to utterances over long periods of time. In this context, the memory and computational requirements of existing algorithms for learning SCFGs from data are prohibitive. We present an online algorithm for learning the parameters of SCFGs that computes summary statistics from sentences as they are observed. The algorithm therefore requires a fixed amount of space regardless of the number of sentences it processes. Although it uses much less information than the Inside-Outside algorithm, our algorithm performs almost as well.
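The abstract describes the algorithm only at a high level. The following Python sketch illustrates what fixed-space summary statistics can look like. It keeps one count per grammar rule and updates those counts from the most probable (Viterbi) parse of each sentence as it arrives. It then renormalizes the counts into rule probabilities and discards the sentence. This incremental Viterbi-style counting is a swapped-in technique and not necessarily the authors' algorithm. The class name `OnlineSCFGEstimator`, the Chomsky-normal-form rule encoding, and the add-one initial counts are all hypothetical assumptions.

```python
import math
from collections import defaultdict


class OnlineSCFGEstimator:
    """Hypothetical online estimator: stores only per-rule counts (fixed space)."""

    def __init__(self, binary_rules, lexical_rules, start="S"):
        # binary_rules: {(A, B, C): prob} for A -> B C
        # lexical_rules: {(A, word): prob} for A -> word
        self.binary = dict(binary_rules)
        self.lexical = dict(lexical_rules)
        self.start = start
        # Summary statistics: one count per rule, seeded with 1.0 (assumed add-one prior).
        self.counts = {rule: 1.0 for rule in [*self.binary, *self.lexical]}

    def _viterbi_rules(self, words):
        """Return the rules used in the most probable parse, or None if unparseable."""
        n = len(words)
        best = {}  # (i, j, A) -> (log prob, rule, split point or None)
        for i, w in enumerate(words):
            for (A, word), p in self.lexical.items():
                if word == w and p > 0:
                    lp = math.log(p)
                    if (i, i + 1, A) not in best or lp > best[(i, i + 1, A)][0]:
                        best[(i, i + 1, A)] = (lp, (A, word), None)
        for span in range(2, n + 1):
            for i in range(n - span + 1):
                j = i + span
                for k in range(i + 1, j):
                    for (A, B, C), p in self.binary.items():
                        left, right = best.get((i, k, B)), best.get((k, j, C))
                        if left is None or right is None or p <= 0:
                            continue
                        lp = math.log(p) + left[0] + right[0]
                        if (i, j, A) not in best or lp > best[(i, j, A)][0]:
                            best[(i, j, A)] = (lp, (A, B, C), k)
        if (0, n, self.start) not in best:
            return None
        used, stack = [], [(0, n, self.start)]
        while stack:
            i, j, A = stack.pop()
            _, rule, k = best[(i, j, A)]
            used.append(rule)
            if k is not None:  # binary rule: descend into both children
                stack.append((i, k, rule[1]))
                stack.append((k, j, rule[2]))
        return used

    def observe(self, sentence):
        """Update counts from one sentence; the sentence itself is not retained."""
        rules = self._viterbi_rules(sentence.split())
        if rules is None:
            return False
        for rule in rules:
            self.counts[rule] += 1.0
        self._renormalize()
        return True

    def _renormalize(self):
        # Rule probability = rule count / total count of rules sharing its left-hand side.
        totals = defaultdict(float)
        for rule, c in self.counts.items():
            totals[rule[0]] += c
        for table in (self.binary, self.lexical):
            for rule in table:
                table[rule] = self.counts[rule] / totals[rule[0]]


est = OnlineSCFGEstimator(
    binary_rules={("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0},
    lexical_rules={("NP", "dogs"): 0.5, ("NP", "cats"): 0.5, ("V", "chase"): 1.0},
)
est.observe("dogs chase dogs")
print(est.lexical[("NP", "dogs")])  # 0.75
```

In this sketch, memory grows with the number of rules rather than the number of sentences observed, which is the property the abstract emphasizes. The trade-off is that hard Viterbi counts ignore alternative parses. Inside-Outside, by contrast, weights every parse by its posterior probability.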

Similar sources

Two Algorithms for Learning the Parameters of Stochastic Context-Free Grammars

Stochastic context-free grammars (SCFGs) are often used to represent the syntax of natural languages. Most algorithms for learning them require storage and repeated processing of a sentence corpus. The memory and computational demands of such algorithms are ill-suited for embedded agents such as a mobile robot. Two algorithms are presented that incrementally learn the parameters of stochastic co...

Studying impressive parameters on the performance of Persian probabilistic context free grammar parser

In linguistics, a tree bank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of tree bank data has been important ever since the first large-scale tree bank, The Penn Treebank, was published. However, although originating in computational linguistics, the value of tree bank is becoming more widely appreciated in linguistics research as a whole. F...

Parsing with the Shortest Derivation

Common wisdom has it that the bias of stochastic grammars in favor of shorter derivations of a sentence is harmful and should be redressed. We show that the common wisdom is wrong for stochastic grammars that use elementary trees instead of context-free rules, such as Stochastic Tree-Substitution Grammars used by Data-Oriented Parsing models. For such grammars a non-probabilistic metric based o...

A Probabilistic Parser and Its Application

We describe a general approach to the probabilistic parsing of context-free grammars. The method integrates context-sensitive statistical knowledge of various types (e.g., syntactic and semantic) and can be trained incrementally from a bracketed corpus. We introduce a variant of the GHR context-free recognition algorithm, and explain how to adapt it for efficient probabilistic parsing. In splitcor...

Stochastic Inversion Transduction Grammars, with Application to Segmentation, Bracketing, and Alignment of Parallel Corpora

We introduce (1) a novel stochastic inversion transduction grammar formalism for bilingual language modeling of sentence-pairs, and (2) the concept of bilingual parsing with potential application to a variety of parallel corpus analysis problems. The formalism combines three tactics against the constraints that render finite-state transducers less useful: it skips directly to a context-free rat...

Publication date: 2001